Cantor's diagonal argument: There are more real than natural numbers

\( \newcommand{\water}{{\rm H_{2}O}} \newcommand{\R}{\mathbb{R}} \newcommand{\N}{\mathbb{N}} \newcommand{\Z}{\mathbb{Z}} \newcommand{\Q}{\mathbb{Q}} \newcommand{\E}{\mathbb{E}} \newcommand{\d}{\mathop{}\!\mathrm{d}} \newcommand{\grad}{\nabla} \newcommand{\T}{^\text{T}} \newcommand{\mathbbone}{\unicode{x1D7D9}} \renewcommand{\:}{\enspace} \DeclareMathOperator*{\argmax}{arg\,max} \DeclareMathOperator*{\argmin}{arg\,min} \DeclareMathOperator{\Tr}{Tr} \newcommand{\norm}[1]{\lVert #1\rVert} \newcommand{\KL}[2]{ \text{KL}\left(\left.\rule{0pt}{10pt} #1 \; \right\| \; #2 \right) } \newcommand{\slashfrac}[2]{\left.#1\middle/#2\right.} \)

Both \(\; \R \;\) and \(\; \N \;\) have infinite elements, but \(\; \R \;\) has more elements than \(\; \N \;\), \(\; |\R| > |\N| \;\). Therefore, there are different types of infinity!

We may suspect this if we think of aligning all the naturals against all the reals. Clearly, if we look at one finite interval, as we grow the interval the amount of reals will grow much faster than the number of natural numbers (in fact, for any finite interval, the amount of natural numbers will be finite but the amount of reals will already be infinite). However, this doesn't directly tell us about all the naturals vs all the reals, at least in a irrefutable way. Someone may argue that if we "squeeze" all the naturals together we can "fill" the real line, and there would be enough naturals to perform this operation because there are infinite naturals, same as there are infinite natural numbers.

Cantor's diagonal argument is a definite proof that there are more real numbers than natural numbers. It shows that, if we attempt to pair all real numbers in an interval (e.g. the \(\; [0,1] \;\) interval) with all natural numbers, such that we can assign one real number to every one natural number, there will always be real numbers that are left out, because there are more real numbers than there are natural numbers.

The pairing will be the function \(\; f: \N \to [0,1] \;\), on which we make no assumptions, so what we deduce about it will be applicable to all pairings or functions.

Suppose we use the function \(\; f \;\) to write an infinitely long table of correspondences between \(\; \N \;\) and \(\; [0,1] \;\). To the left we have all the natural numbers, and to the right we have the corresponding real numbers in \(\; [0,1] \;\) (in decimal representation):

\[ \begin{array}{c|c c c c c c c c c c c} n & f(n) & & & & & & & & & & & \\ \hline 1 & 0 & . & 3 & 7 & 4 & 0 & 8 & 6 & 6 & 5 & 1 & \ldots \\ 2 & 0 & . & 6 & 2 & 1 & 9 & 7 & 5 & 3 & 2 & 2 & \ldots \\ 3 & 0 & . & 4 & 9 & 0 & 1 & 2 & 1 & 5 & 7 & 6 & \ldots \\ 4 & 0 & . & 8 & 7 & 1 & 0 & 2 & 3 & 8 & 1 & 2 & \ldots \\ 5 & 0 & . & 2 & 3 & 7 & 8 & 0 & 1 & 5 & 0 & 7 & \ldots \\ 6 & 0 & . & 1 & 4 & 0 & 5 & 9 & 8 & 3 & 2 & 6 & \ldots \\ \vdots & \vdots & & & & & & & & & & & \\ \end{array} \]

A priori it may seem that all the reals in \(\; [0,1] \;\) could be included in some row in the right column. For example, look at the real in the diagonal. That real, \(\; {\color{red} 0.69189\ldots} \;\), could perhaps be included in the table.

\[ \begin{array}{c|c c c c c c c c c c c} n & f(n) & & & & & & & & & & & \\ \hline 1 & \color{red}{0} & . & 3 & 7 & 4 & 0 & 8 & 6 & 6 & 5 & 1 & \ldots \\ 2 & 0 & . & \color{red}{6} & 2 & 1 & 9 & 7 & 5 & 3 & 2 & 2 & \ldots \\ 3 & 0 & . & 4 & \color{red}{9} & 0 & 1 & 2 & 1 & 5 & 7 & 6 & \ldots \\ 4 & 0 & . & 8 & 7 & \color{red}{1} & 0 & 2 & 3 & 8 & 1 & 2 & \ldots \\ 5 & 0 & . & 2 & 3 & 7 & \color{red}{8} & 0 & 1 & 5 & 0 & 7 & \ldots \\ 6 & 0 & . & 1 & 4 & 0 & 5 & \color{red}9 & 8 & 3 & 2 & 6 & \ldots \\ \vdots & \vdots & & & & & & & & & & & \\ \end{array} \]

Now, let's try to find a number that cannot be on the table, under any circumstances. Take the real on the diagonal and add an offset between 1 and 9 to each of its digits. You will end up with a real that differs from the first real in the table in the first digit, from the second real in the table in the second digit, from the third real in the table in the third digit, and so on. For example, with an offset of 1, you would end up with \(\; {\color{red} 0.70290\ldots} \;\).

Since the real above differs in at least one digit from every real in the table, it is not contained in the table. So for any pairing function \(\; f: \N \to [0,1] \;\), there will always be reals in \(\; [0,1] \;\) that are not in the range of \(\; f \;\). This means that even after we have paired up all the naturals in \(\; \N \;\), there will be reals in \(\; [0,1] \;\) that will be left out. Therefore, there are more reals than there are naturals!